Content Profiling for Preservation: Improving Scale, Depth and Quality
Authors
Abstract
Content profiling is a crucial step in digital preservation that enables controlled management of content over time. However, large-scale profiling faces a set of challenges. As data grows and becomes more diverse, the only way to control it is to combine the outputs of multiple characterization tools to cover the variety of formats and extract features of interest. This combination of tools introduces conflicting measures and poses challenges for data quality. Sparsity and labeling conflicts make it difficult or impossible to partition, sample and analyze the large metadata sets of a content profile. Without such analysis, however, it is virtually impossible to manage heterogeneous collections reliably over time. In this paper, we present the content profiling tool C3PO, which includes rule-based techniques and heuristics designed for conflict reduction. We conduct a set of experiments in which we assess the effect of creating such a mechanism and rule set on the quality and effectiveness of content profiling. The results show the potential of simple conflict reduction rules to strongly improve the data quality of content profiling for analysis and decision support.
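To illustrate what a rule-based conflict reduction step might look like, the following is a minimal sketch and not C3PO's actual rule set. It assumes each characterization tool reports a format label for an object, and it applies two simple rules: normalize known label variants to one canonical label, then flag a conflict only if the tools still disagree. The tool names, the `ALIASES` table and the function names are all hypothetical.

```python
# Hypothetical alias table: maps label variants (lowercased) to one canonical label.
# These entries are illustrative assumptions, not rules taken from C3PO.
ALIASES = {
    "pdf": "PDF",
    "portable document format": "PDF",
    "acrobat pdf": "PDF",
    "jfif": "JPEG",
    "jpeg file interchange format": "JPEG",
}


def normalize(label):
    """Rule 1: map a tool-specific label to its canonical form, if one is known."""
    cleaned = label.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def reduce_conflict(values_by_tool):
    """Rule 2: after normalization, report agreement, conflict, or missing data.

    values_by_tool maps a (hypothetical) tool name to the label it reported.
    Returns a (status, value) tuple.
    """
    normalized = {
        normalize(value)
        for value in values_by_tool.values()
        if value and value.strip()
    }
    if not normalized:
        return ("MISSING", None)
    if len(normalized) == 1:
        return ("OK", normalized.pop())
    # The tools genuinely disagree: keep all candidates for later inspection.
    return ("CONFLICT", sorted(normalized))


if __name__ == "__main__":
    print(reduce_conflict({"tool_a": "Portable Document Format", "tool_b": "PDF"}))
    print(reduce_conflict({"tool_a": "PDF", "tool_b": "JFIF"}))
```

In this sketch, labels that differ only in naming are merged before comparison, so only substantive disagreements remain marked as conflicts, which keeps the resulting metadata easier to partition and sample.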
Similar resources
Glow Discharge Depth Profiling a Powerful Analytical Technique in Surface Engineering (TECHNICAL NOTE)
A variety of analytical techniques have been developed and employed to characterize the surfaces, subsurfaces and interfaces of surface engineering systems. They provide important information for quality control, process optimization and further development. Since the mid 1980's, glow discharge spectrometry (GDS) has emerged as an important and versatile technique for rapid depth profiling anal...
Model based normalization improves differential expression calling in low-depth RNA-seq
RNA-seq is a powerful tool for gene expression profiling and differential expression analysis. Its power depends on sequencing depth, which limits its high-throughput potential, with 10-15 million reads considered an optimal balance between quality of differential expression calling and cost per sample. We observed, however, that some statistical features of the data, e.g. gene count distributio...
The SCAPE Planning and Watch suite Supporting the preservation lifecycle in repositories
Increasingly, content owners are operating repositories with large, heterogeneous collections. The responsibility to provide access to these collections in the long term requires preservation processes such as planning, monitoring, and actual preservation operations such as migration and quality assurance, which have to be managed and integrated with the repositories. This article presents a su...
Large-scale content profiling for preservation analysis
The starting point of any operational endeavor to preserve digital content is gaining a deep understanding of the characteristics of the objects. Systematic analysis of digital object sets and the identification of sample objects that are representative of a collection are critical steps towards preservation operations and a fundamental enabler for successful preservation planning: Without a fu...
Comparison of two methods, nuclear reaction and resonant scattering, for measuring the oxygen depth profile in nanoporous alumina
Depth profiling of oxygen in the surface of materials is important for many oxide elements. In this research, two ion beam analysis techniques were used for depth profiling of oxygen in nanoporous anodic alumina: Nuclear Reaction Analysis (NRA), using 16O(d,p1)17O and 16O(d,p0)17O, and resonant elastic scattering (RES), using 16O(α,α)16O. By using simulation software, variation of oxygen conc...